Failure-Awareness and Dynamic Adaptation in Data Scheduling
Abstract
Over the years, scientific applications have become more complex and more data-intensive. Large-scale simulations and scientific experiments in areas such as physics, biology, astronomy, and the earth sciences, in particular, demand highly distributed resources to satisfy their extensive computational requirements. Growing data requirements and the distributed nature of the resources have made I/O the major bottleneck for end-to-end application performance. Existing systems fail to address issues such as reliability, scalability, and efficiency in wide-area data access, retrieval, and processing. In this study, we explore data-intensive distributed computing and study the challenges of data placement in distributed environments. After analyzing different application scenarios, we develop new data scheduling methodologies and identify the key attributes for reliability, adaptability, and performance optimization of distributed data placement tasks. Inspired by techniques used in microprocessor and operating system architectures, we extend and adapt some well-known low-level data handling and optimization techniques to distributed computing. The two major contributions of this work are (i) a failure-aware data placement paradigm for increased fault tolerance, and (ii) adaptive scheduling of data placement tasks for improved end-to-end performance. Failure-aware data placement comprises early error detection, error classification, and the use of this information in scheduling decisions to prevent, and recover from, possible future errors. The adaptive scheduling approach dynamically tunes data transfer parameters over wide-area networks to make efficient use of the available network capacity and to optimize end-to-end data transfer performance.
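The abstract does not describe how error classification or parameter tuning is implemented. The Python sketch below illustrates the two ideas under stated assumptions that are not taken from the work itself: errors are split into just two classes by matching message text, a permanent error is never retried while a transient one gets a bounded number of retries, and the tuned "transfer parameter" is the number of parallel streams, chosen by a simple doubling hill-climb over measured throughput. The marker strings and the names `classify_error`, `next_action`, and `tune_streams` are hypothetical.

```python
from enum import Enum
from typing import Callable


class ErrorClass(Enum):
    TRANSIENT = "transient"  # may succeed if retried later (e.g. a timeout)
    PERMANENT = "permanent"  # retrying unchanged will not help (e.g. missing file)


# Hypothetical message fragments; a real system would use protocol error codes.
_PERMANENT_MARKERS = ("no such file", "permission denied", "quota exceeded")


def classify_error(message: str) -> ErrorClass:
    """Classify a transfer error from its message text (illustrative heuristic)."""
    text = message.lower()
    if any(marker in text for marker in _PERMANENT_MARKERS):
        return ErrorClass.PERMANENT
    # Unknown errors default to transient so they receive a bounded number of retries.
    return ErrorClass.TRANSIENT


def next_action(message: str, attempts: int, max_attempts: int = 3) -> str:
    """Use the error class in the scheduling decision: 'retry' or 'fail'."""
    if classify_error(message) is ErrorClass.PERMANENT:
        return "fail"  # report immediately instead of wasting retries
    return "retry" if attempts < max_attempts else "fail"


def tune_streams(
    measure: Callable[[int], float], max_streams: int = 32, min_gain: float = 0.05
) -> int:
    """Pick a parallel-stream count by doubling while throughput keeps improving.

    `measure(n)` is assumed to return observed throughput using n streams.
    Stops when doubling yields less than `min_gain` relative improvement.
    """
    best_n, best_tp = 1, measure(1)
    n = 2
    while n <= max_streams:
        tp = measure(n)
        if tp < best_tp * (1 + min_gain):
            break  # gain too small: the path is likely saturated
        best_n, best_tp = n, tp
        n *= 2
    return best_n
```

For example, if throughput grows linearly until the link saturates at 8 streams (`lambda n: min(n * 10.0, 80.0)`), `tune_streams` returns 8. Doubling keeps the number of trial measurements logarithmic in `max_streams`, at the cost of possibly missing an optimum that lies between two powers of two.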
Publication date: 2008